We just saw that the main challenge in Kruskal's is efficiently detecting if adding an edge creates a cycle. The most effective way to solve this is with a specialized data structure called the Disjoint Set Union (DSU), also known as a Union-Find data structure.
- A DSU tracks a collection of disjoint (non-overlapping) sets and provides two core operations:
- find(item): Determines which set an item belongs to by finding the "representative" or root of that set.
- union(item1, item2): Merges the two sets containing
item1anditem2.
- In Kruskal's, we treat each vertex as its own set initially. For each edge
(u, v)we consider:- We check if
find(u)is the same asfind(v). - If they are the same,
uandvare already in the same component. Adding the edge would create a cycle, so we discard it. - If they are different, we add the edge to our MST and perform a
union(u, v)to merge their components.
- We check if
- The total time complexity for Kruskal's is dominated by the initial edge sort, making it $O(E \log E)$.
- The DSU operations, with optimizations, are so fast they are considered nearly constant time on average.
Python: DSU with Path Compression & Union by Size
1class DSU:
2 def __init__(self, n):
3 # parent[i] < 0 means i is a root, and -parent[i] is its size
4 self.parent = [-1] * n
5
6 def find(self, i):
7 if self.parent[i] < 0:
8 return i
9 # Path compression: set parent directly to the root
10 self.parent[i] = self.find(self.parent[i])
11 return self.parent[i]
12
13 def union(self, i, j):
14 root_i = self.find(i)
15 root_j = self.find(j)
16 if root_i != root_j:
17 # Union by size: attach smaller tree to root of larger tree
18 if self.parent[root_i] < self.parent[root_j]:
19 self.parent[root_i] += self.parent[root_j]
20 self.parent[root_j] = root_i
21 else:
22 self.parent[root_j] += self.parent[root_i]
23 self.parent[root_i] = root_j
24 return True
25 return False
26
27if __name__ == '__main__':
28 # Vertices: 0=A, 1=B, 2=C, 3=D
29 dsu = DSU(4)
30 print(f"Initial parent array: {dsu.parent}")
31 dsu.union(0, 1) # Union A and B
32 print(f"After union(A, B): {dsu.parent}")
33 dsu.union(2, 3) # Union C and D
34 print(f"After union(C, D): {dsu.parent}")
35 dsu.union(0, 2) # Union sets {A,B} and {C,D}
36 print(f"After union(A, C): {dsu.parent}")